NGINX Powers 12 Billion Transactions
per Day at Capital One
* BeagleBone Black (5 V power)
* The prototype was able to sustain:
  - 625 TPS for HTTP
  - 335 TPS for HTTPS
NGINX PR: Support for mutual SSL authentication for upstream HTTPS proxy
What technologies did we select for the build, and why?


CapitalOne 


Why NGINX?

* Lightweight: ~10 MB memory footprint, low CPU usage
* Concurrent: supports 100K+ connections
* High-performance web server written in C
* Async I/O
* Event driven
* Pluggable architecture
* Native binding to Lua
* Architectural details

[Chart: Webserver Usage, with a legend that includes Apache and Microsoft-IIS. Source: Usage of web servers for websites, 27 Jan 2017, W3Techs.com]




[Diagram: NGINX process model]

* MASTER PROCESS
* Child processes:
  - Cache Manager (CM)
  - Cache Loader (CL)
  - Worker processes (W), which handle HTTP and other network traffic
* Shared memory is used for cache, session persistence, rate limits, and session log
The master process performs the privileged operations
such as reading configuration and binding to ports, and 
then creates a small number of child processes (the 
next three types). 


The cache loader process runs at startup to load the 
disk-based cache into memory, and then exits. 


The cache manager process runs periodically and 
prunes entries from the disk caches to keep them within 
the configured sizes. 


The worker processes do all of the work! They handle 
network connections, read and write content to disk, 
and communicate with upstream servers. 


How NGINX works 


NGINX uses a Non-Blocking "Event-Driven" architecture 


Listen Sockets & Connection Sockets

Wait for an event (epoll or kqueue)

Event on Listen Socket:
* accept the new connection
* set the new socket to be non-blocking
* add it to the socket list

Event on Connection Socket:
* data in read buffer? read
* space in write buffer? write
* error or timeout? close & remove from the socket list


An NGINX worker can process hundreds of thousands 
of active connections at the same time 
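
As a rough illustration of the loop described on this slide, here is a minimal single-threaded echo server in Python. It is a sketch, not NGINX's implementation: the function names `make_listener` and `run_event_loop` are made up for this example, and Python's `selectors.DefaultSelector` is assumed to pick epoll or kqueue where the platform provides them.

```python
import selectors
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a non-blocking listen socket (port 0 lets the OS choose a port)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen()
    listener.setblocking(False)
    return listener


def run_event_loop(listener, should_stop, poll_timeout=0.05):
    """Wait for events, then accept, read, write, or close. Echoes what it reads."""
    sel = selectors.DefaultSelector()
    sel.register(listener, selectors.EVENT_READ, data=None)
    pending = {}  # connection socket -> bytes waiting to be written back
    try:
        while not should_stop():
            for key, mask in sel.select(timeout=poll_timeout):
                sock = key.fileobj
                if key.data is None:
                    # Event on the listen socket: accept, set non-blocking, add to the list.
                    try:
                        conn, _addr = sock.accept()
                    except BlockingIOError:
                        continue
                    conn.setblocking(False)
                    pending[conn] = b""
                    sel.register(conn, selectors.EVENT_READ, data="conn")
                    continue
                try:
                    if mask & selectors.EVENT_READ:
                        chunk = sock.recv(4096)  # data in read buffer? read
                        if not chunk:
                            raise ConnectionError("peer closed the connection")
                        pending[sock] += chunk
                    if mask & selectors.EVENT_WRITE and pending[sock]:
                        sent = sock.send(pending[sock])  # space in write buffer? write
                        pending[sock] = pending[sock][sent:]
                    # Only ask for write events while something is waiting to be written.
                    events = selectors.EVENT_READ
                    if pending[sock]:
                        events |= selectors.EVENT_WRITE
                    sel.modify(sock, events, data="conn")
                except OSError:
                    # Error or closed peer: close and remove from the socket list.
                    sel.unregister(sock)
                    sock.close()
                    del pending[sock]
    finally:
        for sock in list(pending):
            sel.unregister(sock)
            sock.close()
        sel.unregister(listener)
        sel.close()
```

One thread serves every connection because no call in the loop blocks on a single client; the only place the thread waits is `sel.select`.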
Update configuration on disk
Send SIGHUP to the master, or run nginx -s reload


Master starts new 
worker processes with 
new configurations 


NGINX keeps on running 
with new configuration, and 
no interruption in service 


Old worker processes complete 
existing transactions and then 
exit gracefully 


NGINX is fantastic at Scaling and Handling Concurrent Connections 
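
The reload sequence on this slide can be modelled with a small simulation. This is only a sketch of the idea, assuming hypothetical `Master` and `Worker` classes; it does not use real processes or signals.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Worker:
    config: dict
    in_flight: int = 0       # requests currently being served
    draining: bool = False   # True once a newer generation has replaced this worker


@dataclass
class Master:
    config: dict
    worker_count: int = 2
    workers: list = field(default_factory=list)

    def __post_init__(self):
        self.workers = [Worker(self.config) for _ in range(self.worker_count)]

    def start_request(self):
        # New requests only go to workers running the current configuration.
        accepting = [w for w in self.workers if not w.draining]
        worker = min(accepting, key=lambda w: w.in_flight)
        worker.in_flight += 1
        return worker

    def finish_request(self, worker):
        worker.in_flight -= 1
        self.reap()

    def reload(self, new_config):
        # Mark the old generation as draining and start workers with the new configuration.
        for old in self.workers:
            old.draining = True
        self.config = new_config
        self.workers += [Worker(new_config) for _ in range(self.worker_count)]
        self.reap()

    def reap(self):
        # A draining worker exits once its last in-flight request completes.
        self.workers = [w for w in self.workers
                        if not (w.draining and w.in_flight == 0)]
```

A request that was in progress during `reload` finishes on its original worker, while every request started afterwards sees the new configuration.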
Lua Language

* Lua is a powerful, efficient, lightweight, embeddable scripting language.
* It supports procedural programming, object-oriented programming, functional
  programming, data-driven programming, and data description.
* "Lua" (pronounced LOO-ah) means "Moon" in Portuguese.
* Lua is designed, implemented, and maintained by a team at PUC-Rio, the
  Pontifical Catholic University of Rio de Janeiro in Brazil.
* Lua is distributed in a small package and builds out of the box on all platforms
  that have a standard C compiler, from embedded devices to IBM mainframes.
Lua / NGINX (OpenResty) offers the flexibility of Lua with the power of statically compiled C++

* LuaJIT binding: ZeroMQ binding (http://zeromq.org/bindings:lua)
* LuaJIT2:
  - mean throughput: 6,160,911 [msg/s]
  - mean throughput: 1478.619 [Mb/s]
* C++ code:
  - mean throughput: 6,241,452 [msg/s]
  - mean throughput: 1497.948 [Mb/s]
The LuaJIT (Just-In-Time) compiler ensures that our code runs fast

[Screenshot: LuaJIT interactive performance comparison chart, comparing VMs such as LuaJIT 1.1.6 and LuaJIT 2.0.0 (interpreter and JIT) on x86 and x64. A ratio of 49.71 means the 2nd VM runs that benchmark almost fifty times faster. The bar graph has a logarithmic scale.]

Benchmark     N       Ratio
partialsums   1e7     3.73
chameneos     1e7     2.55
series        10000   2.16
sum-file      5000    1.56
Leading Lua Users


World of Warcraft : Multiplayer game 

Adobe Lightroom: environment for the art and craft of digital photography 

LEGO Mindstorms NXT

Space Shuttle Hazardous Gas Detection System: ASRC Aerospace, Kennedy Space Center 
Barracuda Embedded Web Server 

Wireshark : Network protocol analyzer 

Asterisk: Telecom PBX 

Radvision SIP Stack 

Redis, RocksDB(Facebook), Tarantool and many other DBs 


Capital One: 

* Capital One DevExchange API Gateway
* Virtual Cards
* Tokenization Platform
Capital One's RESTful API
& Architecture Journey


Capital One has been investing heavily in RESTful APIs since 2014 


* There was a strong need for a gateway to serve as the single point of entry for all API traffic.
* The gateway handles:
  - Authentication
  - Authorization
  - Rate limiting
  - API routing
  - Custom policies
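
One way to picture these responsibilities is a chain of policies that each request passes through in order. The sketch below is illustrative only: the request shape, the `make_gateway` helper, and the token-bucket limiter are assumptions for this example, not the DevExchange Gateway's design.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second, with bursts of up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to the time elapsed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def make_gateway(api_keys, routes, buckets):
    """Return a handler that runs each policy in order and stops at the first response."""

    def authenticate(req):
        req["client"] = api_keys.get(req.get("api_key"))
        return None if req["client"] else (401, "unknown client")

    def authorize(req):
        return None if req["path"] in req["client"]["scopes"] else (403, "not permitted")

    def rate_limit(req):
        bucket = buckets[req["client"]["name"]]
        return None if bucket.allow() else (429, "rate limit exceeded")

    def route(req):
        upstream = routes.get(req["path"])
        return (200, upstream) if upstream else (404, "no route")

    # Custom policies would be inserted into this list ahead of `route`.
    policies = [authenticate, authorize, rate_limit, route]

    def handle(req):
        for policy in policies:
            result = policy(req)
            if result is not None:
                return result
        return (500, "no policy produced a response")

    return handle
```

Each policy returns `None` to let the request continue or a `(status, detail)` pair to stop it, so rejected requests never reach the routing step.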
By 2016 we had an opportunity to consolidate a number of legacy gateway products

* Given our gateway consolidation and migration strategy, our requirements grew complex

[Diagram: legacy products to consolidate: XML / SOAP Appliance, Legacy Service Bus, Vendor RESTful API Gateway]

* 12 external options were evaluated
  - 8 options were eliminated prior to load testing
  - 4 commercial and open source gateways were load tested head to head
* In addition, we evaluated Rohit's prototype
* We selected our home-grown solution based on features, performance, resiliency and scalability
At first I didn't believe Rohit, so I did my own testing
Experimental API to test out the relative performance

Language (Framework)       Multitasking Model Used       Average Throughput
NodeJS (ExpressJS)         Single-Threaded Event Loop    ~12K TPS
Java (Spring Boot)         Multi-threaded                ~15K TPS
Go (Standard Libraries)    Goroutines                    ~95K TPS
LuaJIT with NGINX          Single-Threaded Event Loop    ~97K TPS
[Diagram labels: Users / Devices; API Client Apps and Stream Producers; API Platform; Other Internal Systems; Protected APIs]


We defined our Architecture / Design Principles to ensure we can meet our high NFRs 


* Leverage ACID transactions only where required and avoid them where possible
* Make systems stateless or leverage immutable data that is safe to cache indefinitely
* Separate reads from writes
* Partition or shard data to meet SLAs
* Micro-batch processing
We Leverage ACID Transactions Only Where Required and Avoid Them Where Possible

* Ensuring data consistency is hard
* Data replication and coordination take time
* Examples requiring ACID properties:
  - Issuing virtual credit card numbers, issuing new tokens, and coordinating API changes
* Examples that don't require ACID properties:
  - Logging, reading of immutable data/tokens
We Make Systems Stateless or Leverage Immutable Data that is Safe to Cache Indefinitely

* Many API gateways store a copy of access tokens in a database
* The token lifecycle can be broken into 2 pieces to make it scale better:
  - DevExchange Gateway issues stateless JWE access tokens
  - Revoking an access token can still be accomplished with a token blacklist
* For the Tokenization use cases, tokens are immutable and can be cached permanently on each server
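
To make the two-piece token lifecycle concrete, here is a small sketch. It swaps in an HMAC-signed token built from the standard library in place of JWE (which would also encrypt the claims), and the key, claim names, and function names are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-signing-key"  # hypothetical key; a real gateway would use managed keys
revoked_ids = set()           # small revocation list, held in memory on each node


def _b64(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(subject, token_id, ttl_seconds, now=None):
    """Pack the claims into the token itself, so no database row is needed."""
    now = time.time() if now is None else now
    claims = {"sub": subject, "jti": token_id, "exp": now + ttl_seconds}
    payload = _b64(json.dumps(claims, sort_keys=True).encode())
    signature = _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())
    return payload + "." + signature


def verify_token(token, now=None):
    """Return the claims if the token is authentic, unexpired, and not revoked."""
    now = time.time() if now is None else now
    try:
        payload, signature = token.split(".")
        expected = _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())
        if not hmac.compare_digest(signature, expected):
            return None
        claims = json.loads(_unb64(payload))
    except (ValueError, TypeError):
        return None
    if claims["exp"] <= now or claims["jti"] in revoked_ids:
        return None
    return claims
```

Verification needs only the key and the revocation set, so any gateway node can check a token without a database lookup; only the list of revoked token IDs has to be shared.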
We Separate Reads from Writes to Scale

Workload: 98% Reads / 2% Writes

* Separating reads and writes allows them to be scaled differently
  without inhibiting the other operation
* For the Tokenization use case:
  - The relationships between tokens and original values are cached on every
    machine
  - Creating new tokens requires ACID transactions and uses RDS underneath with
    out-of-region encrypted read replicas
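
A minimal sketch of this split, assuming a hypothetical `TokenVault` class: an in-memory SQLite database stands in for the relational store on the write path, and a plain dictionary stands in for the per-machine cache on the read path.

```python
import secrets
import sqlite3


class TokenVault:
    """Write path: transactional store. Read path: per-machine in-memory cache."""

    def __init__(self):
        # In-memory SQLite is a stand-in for the relational store (an assumption).
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE tokens (token TEXT PRIMARY KEY, value TEXT UNIQUE NOT NULL)"
        )
        self.cache = {}       # token -> original value; entries never change once written
        self.store_reads = 0  # counts reads that had to fall through to the store

    def tokenize(self, value):
        """Create (or look up) the token for a value inside one transaction."""
        with self.db:  # commits on success, rolls back on error
            row = self.db.execute(
                "SELECT token FROM tokens WHERE value = ?", (value,)
            ).fetchone()
            if row:
                return row[0]
            token = "tok_" + secrets.token_hex(8)
            self.db.execute(
                "INSERT INTO tokens (token, value) VALUES (?, ?)", (token, value)
            )
            return token

    def detokenize(self, token):
        """Serve reads from the cache; only a miss touches the store."""
        if token in self.cache:
            return self.cache[token]
        self.store_reads += 1
        row = self.db.execute(
            "SELECT value FROM tokens WHERE token = ?", (token,)
        ).fetchone()
        if row is None:
            return None
        self.cache[token] = row[0]  # safe to keep: the mapping is immutable
        return row[0]
```

With a 98% read workload, nearly all calls are `detokenize` calls served from the dictionary, which is why the read path can be scaled by adding machines while the write path stays transactional.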
Partition or Shard Data to Meet SLAs

* Partitioning or sharding data can:
  - Spread the load
  - Guarantee cache availability
  - Ensure consistent performance
* Partitioning can be managed manually or provided
  by the storage platform
* Tokenization use case:
  - Data is partitioned based on field type (separate caches
    and RDBMSs)
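
Partitioning by field type can be sketched as follows. The `PartitionedStore` class, the field type names used in the test, and the choice of hash-based sharding within each partition are all assumptions for illustration; the slide only states that data is partitioned by field type.

```python
import hashlib


class PartitionedStore:
    """Keep each field type in its own partition, split into a fixed number of shards."""

    def __init__(self, field_types, shards_per_type=4):
        self.shards_per_type = shards_per_type
        self.partitions = {
            field_type: [dict() for _ in range(shards_per_type)]
            for field_type in field_types
        }

    def _shard(self, field_type, key):
        if field_type not in self.partitions:
            raise KeyError(f"unknown field type: {field_type}")
        # Use a stable hash (not Python's per-process hash()) so that every
        # machine maps the same key to the same shard.
        digest = hashlib.sha256(key.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % self.shards_per_type
        return self.partitions[field_type][index]

    def put(self, field_type, key, value):
        self._shard(field_type, key)[key] = value

    def get(self, field_type, key):
        return self._shard(field_type, key).get(key)
```

Because each field type has its own partition, a surge of traffic for one kind of data cannot evict or slow down lookups for another.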
Performance Testing
[Diagram: Tech Stack, with the following labels]

* Load Balancing
* Authentication
* Request / Response Transformation
* Cloud Automation
* LuaJIT
* Routing REST/SOAP
* Throttling
* OAuth 2.0 / JWE / JWT
* Token Inactivity
* IP White / Black List
* Scriptable Policies
* Logging
* HTTP 1.0/2.0
* SSL / TLS 1.1/1.2
* Token Revoke List
* HSM Integration
* Caching
* AWS-RDS
Reusable Modules / Libraries

Build Decisions

Application

* DevExchange Gateway (<1 ms latency, 45,000+ TPS)
* Data Tokenization Platform (<100 ms latency, 2.5M RPS)
* Virtual Payment Cards (<10 ms latency)
Our NGINX stack has enabled us to meet and exceed all of our expectations

* DevExchange Gateway
  - 22 billion transactions per day
  - 45,000 transactions per second (peak)
  - <1 ms latency (average)
* Data Tokenization Platform
  - 4+ billion records
  - 3+ terabytes of data
  - 12 billion operations per day
  - 2.5 million operations per second (peak)
  - 20-40 ms latency (average)
* Virtual Payment Cards
  - <2 ms latency (average)
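
For a sense of scale, the Data Tokenization Platform's daily and peak figures can be related with a short calculation. This is plain arithmetic on the numbers quoted on this slide; the variable and function names are made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_per_second(operations_per_day):
    """Spread a daily total evenly across the day."""
    return operations_per_day / SECONDS_PER_DAY


# Figures quoted for the Data Tokenization Platform.
daily_operations = 12_000_000_000
peak_per_second = 2_500_000

average = average_per_second(daily_operations)  # roughly 138,889 operations per second
peak_to_average = peak_per_second / average     # roughly 18

print(f"average: {average:,.0f} ops/s, peak is {peak_to_average:.0f}x the average")
```

The quoted peak is about 18 times the daily average, which is the headroom the platform has to be sized for.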
Thank You